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Abstract 

Riemannian geometry has been applied to Brain Computer Inter¬ 
face (BCI) for brain signals classification yielding promising results. 
Studying electroencephalographic (EEG) signals from their associated 
covariance matrices allows a mitigation of common sources of vari¬ 
ability (electronic, electrical, biological) by constructing a representa¬ 
tion which is invariant to these perturbations. While working in Eu¬ 
clidean space with covariance matrices is known to be error-prone, one 
might take advantage of algorithmic advances in information geometry 
and matrix manifold to implement methods for Symmetric Positive- 
Definite (SPD) matrices. This paper proposes a comprehensive review 
of the actual tools of information geometry and how they could be 
applied on covariance matrices of EEG. In practice, covariance ma¬ 
trices should be estimated, thus a thorough study of all estimators is 
conducted on real EEG dataset. As a main contribution, this paper 
proposes an online implementation of a classifier in the Riemannian 
space and its subsequent assessment in Steady-State Visually Evoked 
Potential (SSVEP) experimentations. 
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1 Introduction 


Human-machine interactions without relying on muscular capabilities is pos¬ 
sible with Brain-Computer Interfaces (BCI) [1, 2, 3]. They are the focus of 
a large scientific interest [4, 5, 6, 7, 8, 9], especially those based on electro¬ 
encephalography (EEG) [10, 11]. From the large literature based on the 
BCI competition datasets [12, 13, 14], one can identified the two most chal¬ 
lenging BCI problems: on one hand, the inter-individual variability plagues 
the models and lead to BCl-inefficiency effect [15, 16, 17], on the other 
hand the intra-individual changes calls for the development of online algo¬ 
rithms and adaptive systems following the evolution of the subject’s brain 
waves [18, 19, 20, 21, 22]. To alleviate these variations, several signal pro¬ 
cessing and machine learning techniques have been proposed, such as filter¬ 
ing, regularization or clustering [23, 24, 25] without marking out an obvious 
“best candidate” methodology. 

A common vision is shared by all the most efficient approaches to reduce 
signal variabilities: they are applied on covariance matrices instead of work¬ 
ing in the input signal space. Common Spatial Pattern (CSP) [26, 27, 28, 29], 
which is the most known preprocessing technique in BCI, try to maximize 
the covariance of one class while minimizing the covariance of the other one. 
Similarly, Principal Components Analysis (PCA) [12, 13], also applied for 
spatial filtering in BCI, is based on the estimation of covariance matrices. 
Canonical Correlation Analysis (CCA) [30] is another example of a tech¬ 
nique relying on covariance estimates successfully applied on EEG for spatial 
filtering [31, 21, 24]. Given two sets of signals, CCA aims at finding the pro¬ 
jection space that maximizes their cross-covariance while jointly minimizing 
their covariance. Covariance matrices are also found in classifiers such as 
the Linear Discriminant Analysis (LDA), which is largely used in BCI. In all 
cases, they are handled as elements of an Euclidean space. However, being 
Symmetric and Positive-Definite (SPD), covariance matrices lie on a subset 
of the Euclidean space, with reduced dimensionality and specific properties, 
the Riemannian manifold. Considering covariance matrices in their origi¬ 
nal space would reduce the search area for an optimization problem [32], 
As Riemannian manifolds inherently define a metric, the distance between 
SPD matrices takes into account the space where they lie on; approximating 
it to an Euclidean space introduce inaccuracies and result in ill-conditioned 
matrices. 

Recently, studies have been done to consider covariance matrices ob¬ 
tained from multichannel brain signals in their original space [33, 34]. Co- 
variance matrices are the input features of the BCI system and the classifier 
algorithms rely on Riemannian metric for partitioning the feature space. The 
authors propose to build specific covariance matrices in order to emphasize 
the spatial and temporal information of the multichannel brain signals. The 
outcome of this approach is a simple processing toolchain, which achieve 
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state-of-the-art classification performances. 

This paper introduces an online version of the minimum distance to Rie- 
mannian mean algorithm [34], with an application on Steady-State Visually 
Evoked Potential (SSVEP) recordings. In SSVEP, the subjects concentrate 
on stimuli blinking with fixed frequencies; depending on the focus of their 
attention, brain waves will arise with the same phase and frequency than the 
stimulus chosen by the subject. The proposed online implementation im¬ 
proves the performances in the BCI task, as the parameters of the classifier 
adapt to new, unlabeled samples. Such adaptation takes into account the 
qualitative changes in the brain responses and the acquired signals, caused 
respectively by internal and external factors, e.g. user fatigue and slight 
variations in experimental settings. The proposed online implementation is 
similar to the unsupervised or semi-unsupervised learning scheme proposed 
in [35, 36]; that has the potential of shortening (or even removing) the cal¬ 
ibration phase. The proposed algorithm applies a similar approach to dy¬ 
namic stopping criterion used in [37] to increase the speed of the BCI system. 
This approach allows to dynamically determine the trial length and ensure 
robustness in classification results. When working with covariance matrices, 
a crucial point is to correctly estimate the covariance when the number of 
samples is small or heavily corrupted by noise. Several approaches have 
been proposed to build the covariance matrices, relying on normalization or 
regularization of the sample covariances. To assess the quality of the covari¬ 
ance matrices obtained from EEG samples, a comparative study of these 
estimators is conducted. 

Hence, the contributions of this works are: 

• a comprehensive review of the literature on Riemannian geometry ap¬ 
plied to EEG and time-series, 

• an online and asynchronous classification algorithm for SSVEP-based 
BCI, 

• a thorough analysis of the covariance estimators and their impact on 
tools derived from information geometry. 

The paper is divided as follows: Section 2 reviews application of Rie- 
mannian geometry to machine learning. Section 3 presents concepts of Rie¬ 
mannian geometry relevant to this work and estimators of covariance. In 
Section 4, the proposed classification algorithm for online SSVEP is intro¬ 
duced and the experimental results are presented in Section 5. 

2 State of the Art 

Information geometry provides useful tools for various machine learning and 
optimization problems. In machine learning, SPD matrices have been used 


3 


in various applications where features and data are only considered in the 
Euclidean space. Indeed, covariance matrices lie in the space of SPD matri¬ 
ces which is a subset of the Euclidean space when considered with the scalar 
product. But the same space of SPD matrices, endowed with a differential 
structure, induces a Riemannian manifold. 

Riemannian geometry can improve machine learning algorithms, tak¬ 
ing explicitly into consideration the underlying structure of the considered 
space. Three kind of approaches in the literature uses the data geometry in 
machine learning. The first one relies on the mapping of the Riemannian 
manifold onto an Euclidean vector space. One such mapping, called loga¬ 
rithmic mapping, exist between the manifold and its tangent space, which is 
an Euclidean space, and has been used in classification task for BCI [38, 39]. 
Other kernels have been applied successfully to this end: Stein kernel, Log- 
Euclidean kernels as well as their normalized versions [40]. The main idea 
is to map the input data to a high dimensional feature space, providing a 
rich and hopefully linearly separable representation. The so-called kernel 
trick is to provide a kernel function, which computes an inner product in 
the feature space directly from points lying in the input space, defining a 
Reproducing Kernel Hilbert Space (RKHS). The family of kernels defined 
on the Riemannian manifold allows implementing extension of all kernel- 
based methods, such as SVM, kernel-PCA or kernel £:-means [41]. Apart 
from the kernel approaches, once the data are mapped onto a vector space, 
any machine learning algorithm working in Euclidean space, such as LDA, 
could be applied [34] . 

A second kind of machine learning approach exploit the underlying ge¬ 
ometry of the data. Instead of mapping the data to an Euclidean space, 
either a tangent space or a RKHS, the algorithms are adapted to Rieman¬ 
nian space. For instance, sparse coding algorithm have been adapted to Rie¬ 
mannian manifold, using the geodesic distance to estimate the data point 
and its sparse estimate [42]. Similarly nonlinear dimensionality reduction 
techniques have been adapted to Riemannian manifold, such as Laplacian 
Eigenmaps (LE), Locally Linear Embedding (LLE), and Hessian LLE. This 
adaptation was used to cluster data using their pdfs [43] or covariance ma¬ 
trices [44] as features. Another example is the adaptation of interpolation 
and filtering of data to Riemannian space performed in [45] , where an affine- 
invariant Riemannian metric is also proposed to offer a geodesically com¬ 
plete manifold i.e a manifold with no edge and no singular point that can 
be reached in finite time. 

In the last kind of approach, instead of adapting existing algorithm from 
Euclidean to Riemannian geometry, new algorithms are developed directly 
for Riemannian manifolds. The minimum distance to Riemannian mean 
(MDRM) relies on a Riemannian metric to implement a multi-class classi¬ 
fier and have been applied on EEG. New EEG trials are assigned to the 
class whose average covariance matrix is the closest to the trial covariance 
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matrix [34]. The MDR.M classification can be preceded by a filtering of 
covariance matrices, like in [33] where covariance matrices are filtered with 
LDA component in the tangent space, then brought back to the Riemannian 
space for classification with MDRM. Another example is the Riemannian 
Potato [46] , an unsupervised and adaptive artifact detection method, provid¬ 
ing an online adaptive EEG filtering (i.e outliers removal). Incoming signals 
are rejected if their covariance matrix lies beyond a predefined distance z- 
score from the mean covariance matrix, computed from a sliding window. 
With the same objective of achieving robustness to noise that affect covari¬ 
ance matrices, Riemannian geometry is used to solve divergence functions 
of pdfs [47]. This allows to reformulate the CSP as the maximization of 
the divergence between the distributions of data from two different classes 
corresponding to two cognitive states [48, 49]. Using the beta divergence 
the obtained CSP is robust to outliers in sample covariance matrices and 
this algorithm is successfully applied to EEG filtering for BCI. Riemannian 
metrics are also used for EEG channel selection [50] and the selection of the 
most discriminatory spatial filters in CSP [51]. 

Applications of Riemannian geometry to BCI mentioned thus far are 
focusing on motor imagery (MI) paradigm. In MI experiment, the subject 
is asked to imagine a movement (usually hand, feet or tongue), generating 
Event-Related Synchronization and Desynchronization (ERD/ERS) in pre¬ 
motor brain area. Riemannian BCI is well suited for MI experiment as the 
spatial information linked with synchronization is directly embedded in co- 
variance matrices obtained from multichannel recordings. However, for BCI 
that rely on Evoked Potential such as SSVEP or Event Related Potential 
(ERP), as P300, both frequential and temporal information are needed; the 
spatial covariance matrix does not contain these information. To apply Rie¬ 
mannian geometry to SSVEP and ERP, the sample covariance matrices can 
be defined from a rearrangement of the recorded data. The rearrangement 
is done such that the temporal or frequency information are captured [52], 
With similar motivations, [53] and [54] defined a new Riemannian distance 
between SPD matrices that would take into account a weighting factor on 
matrices. They use this new distance as a dissimilarity between weighted 
matrices of power spectral density to classify EEG into different sleep state 
by k -nearest neighbors. 

3 Covariance Matrices and their Geometry 

This section presents some formal definitions for the information geometry 
concepts used in this paper. The link with the covariance matrices is expli¬ 
cated in Section 3.2, along with the covariance estimators proposed in the 
literature. 
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3.1 Riemannian Manifold 

An m-dimensional manifold M is a Hausdorff space for which every point 
has a neighborhood that is homeomorphic to an open subset of M m [55, 56]. 
When a tangent space is defined at each point, M is called a differential 
manifold. A geodesic 7 is the shortest smooth curve between two points, Si 
and S 2 . The tangent space T^M at point S is the vector space spanned by 
the tangent vectors of all geodesics on M passing through S. A Riemannian 
manifold is a manifold endowed with an inner product defined on the tangent 
space, which varies smoothly from point to point. 

For the rest of this paper, we will restrict to the analysis of the manifold 
Me of the C x C symmetric positive definite matrices, defined as: 

M c = {S G M CxC : S = S T and x T £x > 0,Vx G M c \0} . 

The tangent space T^Mc is identified to the Euclidean space of symmetric 
matrices: 

Sc = {6 £ M CxC : 0 = 0 T } . 

The dimension of the manifold Me, and its tangent space T^Mc, is m = 
C(C+ l)/ 2 . 

The mapping from a point 0j of the tangent space to the manifold is 
called the exponential mapping Exp s (0j): T^Mc —> Me and is defined as: 

Exp s (0j) = £2 Exp (£~2 0 jX - 5 )E 2 . ( 1 ) 

Its inverse mapping, from the manifold to the tangent space is the logarith¬ 
mic mapping Log s (£j): Me Tj^Mc and is defined as: 

Log^(£i) = L°g(£ 5 £j£ 5 )£3 . [ 2 ) 

Exp(-) and Log(-) are the the matrix exponential and logarithm respectively. 
The computation of these operators is straightforward for SPD matrices of 
Me- They are obtained from their eigenvalue decomposition (EVD): 

£ = [/diag(Ai,...,A c )C/ T , 

Exp(£) = U diag(log(Ai),..., log(Ac)) U J , 

Log(£) = U diag(exp(Ai),..., exp(A c )) U J , 

where Ai,..., A 7 are the eigenvalues and U the matrix of eigenvectors of £. 
As any SPD matrix can be diagonalized with strictly positive eigenvalues, 
Log(-) is always defined. Similarly the square root £2 is obtained as: 

£5 = U diag(A^,..., A^,) U J , 

and is unique. The same goes for £“ 2 . 
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The tangent vector of the geodesic 7 (t) between Si and £ 2 , where 7 ( 0 ) = 
Si and 7 ( 1 ) = S 2 is defined as v = S 1 S 2 = Log Sl (S 2 ). A Riemannian 
distance between Si and S 2 can thus be defined as [57]: 

c 1V 2 

Y lo § 2 » ( 3 ) 

,c= 1 

where A c , c = 1 are the eigenvalues of S]” 1 S 2 . From Eq. (3), the 

geometric mean of / points £,; on the manifold, i = 1,..., I, can be defined 
as the point that minimizes the sum of squared distances to all £*: 

1 

yu(Si,... ,Sj) = arg min V]<5 2 (Sj, S) . (4) 

zeM c “ 

1=1 

This geometric mean has no closed form, and can be computed iteratively 

[58]- 

3.2 Covariance Matrix Estimation 

Let x n G M^, n = 1,..., N, denotes a sample of a multichannel EEG trial 
recorded on C electrodes. N is the trial length. Let X € W CxN be the EEG 
trial such as X = [aq,..., a:#]- Under the hypothesis that all N samples x n 
are randomly drawn from a distribution, it follows that A is a variable of 
random vectors and its expected vector is oj = E{X} [59]. The covariance 
matrix of the random vector X is defined by £ = E{(X — ui)(X — w) T } 
and is unknown, thus an estimate £ should be computed. The choice of 
the appropriate estimator is crucial to verify that the obtained covariance 
matrices fulfill the following properties: they should be accurate, SPD and 
well-conditioned. The last property requires that the ratio between the 
maximum and minimum singular value is not too large. Moreover, to ensure 
the computational stability of the algorithm, the estimator should provide 
full-rank matrices, and its inversion should not amplify estimation errors. 


<*(£!, £2) = ||Log(£r 1 £ 2 )||F = 


3.2.1 Sample Covariance Matrix Estimator 

The most usual estimator is the empirical sample covariance matrix (SCM), 
defined as: 

1 N 

L'scm — ^ ' ( X n x)(x n 

"=1 (5) 

= f ijv - — t N tl r 1 x T , 

N - 1 V N J 

where x is the sample mean vector x = jj Y2n=i Xn% -^ n ma t r ix notation, 
I at is the N x N identity matrix and l,v is the vector [1,..., 1], The SCM 
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is often normalized as: 


V _ C A (x n - x){x n - X-) T 

^nscm ~ W 

n=l 

(j 2 Cn is the inter-channel variance at time n. Other normalization techniques 
could be used. 

This estimation is fast and computationally simple. However when C ~ 
N, the SCM is not a good estimator of the true covariance. In the case 
C > N, the SCM is not even full rank. 


3.2.2 Shrinkage Covariance Matrix Estimators 

To overcome these shortcomings, the shrinkage estimators have been devel¬ 
oped as a weighted combination of the SCM and a target covariance matrix, 
which is often chosen to be close to the identity matrix, i.e. resulting from 
almost independent variables of unit variance. 

^shrink = + (1 K)S scm , (7) 

where 0 ^ k < 1. This estimator provides a regularized covariance that 
outperforms the empirical E scm for small sample size, that is C ~ N. The 
shrinkage estimator has the same eigenvectors as the SCM, but the extreme 
eigenvalues are modified i.e. the estimator is shrunk or elongated toward 
the average. 

The different shrinkage estimators differ in their definition of the target 
covariance matrix T. Ledoit and Wolf [60] have proposed T = vie, with 

v = Tr(E scm Ic). Blankertz [61] defines T also as vie but with v = Ir ^ cm ^ , 
m being the dimension of the covariance matrices space. Schafer proposes 
several ways of defining T depending on the observed E scm [62]. 


3.2.3 Fixed Point Covariance Matrix Estimator 

The Fixed Point Covariance Matrix [63] is based on the maximum likelihood 
estimator l which is a solution to the following equation: 


C 




(x n - x){x n - z) T 
^ \ (x n - xyi~ x {x n - x) t 


( 8 ) 


As there is no closed form expression to Eq. (8), it can be written as a 
function of £: g(£) = Ef p . g admits a single fixed point £*, where g(£*) = £*, 
which is a solution to Eq. (8). Using £q := E n scm as the initial value of £, it 
is solved recursively as £t —> £*■ 

t->-oo 
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4 Online Adaptation 

of the Riemannian Classifier 

Concerning Riemannian classification of SSVEP, the offline methodology 
has been explained in [52]. In this paper, we propose an online classifier for 
SSVEP, composed of an offline training step and an online and asynchronous 
test step. This analysis is performed for each subject independently. 


4.1 Offline Riemannian Classification 


The proposed classifier relies on the Minimum Distance to Riemannian Mean 
(MDRM) introduced in [33] and extended in [52] for possible applications 
on SSVEP signals. Let consider an experimental SSVEP setup with F 
stimulus blinking at F different frequencies. It is a multiclass classification 
with K = F + 1 classes: one class per stimulus and one resting state class. 
The covariance matrices are estimated from a modified version of the input 
signal X : 

V| r eq j 


V £ R CxN -> 


G M 


FCxN 


(9) 


X, 


freq F 


where Vf req/ is the input signal X band-pass filtered around frequency freq^, 
/ = 1,... ,F. Thus the resulting covariance matrix E belongs to R FCxFC . 
Henceforth, all EEG signals will be considered as filtered and modified by 
Eq. (9). 

From I labelled training trials {JQ } i=1 recorded per subject, K centers 
(k) 

of classes Ejt ' are estimated using Algorithm 1. When an unlabelled test 

(k) 

trial Y is given, it is classified as belonging to the class whose center E^ ' is 
the closest to the trial’s covariance matrix (Algorithm 2, step 2). 


Algorithm 1 Offline Estimation of Riemannian Centers of Classes 
Inputs: X t G R FCxN , for i = 1,a set of labelled trials. 
Inputs: I(k), a set of indices of trials belonging to class k. 

Output: E q ' ) , k = 1,..., K, centers of classes. 

1 : Compute covariance matrices Sj of X{ 

2: for k = 1 to K do 

3: Ej, fc) = /i(Ei : * G 1(h)) , Eq. (4) 

4: end 

5: return sj^ 
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Algorithm 2 Minimum Distance to Riemannian Mean 
(k) 

Inputs: , K centers of classes from Algorithm 1. 

Input: Y 6 W FCxN , an unlabelled test trial. 

Output: k*, the predicted label of Y. 

1: Compute covariance matrix E of Y 
2: k* = argmin/j S(E, E^) 

3: return k* 


Algorithm 3 Curve-based Online Classification 

Inputs: hyper-parameters w, An, D , and d. 

(k) 

Inputs: , A: = centers of classes from Algorithm 1 (offline 

training). 

Inputs: Online EEG recording x(n). 

Output: k(n), online predicted class. 

1: d = 1 

2: for n = w to AT step An 

3: Epoch Xd, Eq. (10), and classify it with Algorithm 2 

4: if d > D 

5: Find the most recurrent class in /C = k* £ j^ d y. k = argmax fc p(k) , 

Eq. (11) _ 

6: if p{k) > d 

7: Compute 5j. , Eq. (12) 

8: if Sj. < 0 

9: return k = k 

10: end 

11: end 

12: end 

13: d = d + 1 

14: end 
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4.2 Curve-based Online Classification 

In offline synchronous BCI paradigm, cue onset are used as reference for the 
localization of a brain response, e.g. for an evoked potential. Nonetheless 
most of the BCI applications are online and asynchronous; cue onsets are 
not known, thus designing online version of BCI algorithms, even well docu¬ 
mented ones, is not a trivial task. The approach introduced here identifies a 
period (he. interval) in the online EEG x £ M. FCx X , where A f is the number 
of recorded samples, associated with a high probability (above threshold) of 
observing an SSVEP at a specific frequency, as illustrated in Algorithm 3. 

To locate this interval, we focus on the last D recorded EEG overlap¬ 
ping epochs [Xi € R FCxw }, with the set of indices J{d) = d — D + 
1 ,,d — 1 ,d] where d is the index of the current epoch Xj in the online 
recording %(n). Epochs have size w, and the interval between two consecu¬ 
tive epochs is An, with w > An: 


Xd = x(n -w,...,n) 


( 10 ) 


To obtain the first D epochs Xj e at least w + (D — 1) An samples of 
X should be recorded (step 4). 

The classification outputs k hj(d) obtained by applying Algorithm 2 
(step 3) on Xj g are stored in a vector 1C. The class that occurs the 
most in K. (step 5), with an occurrence probability p(k ) above a defined 
threshold '0 , is considered to be the class, denoted k, of the ongoing EEG 
recording x(n). Vector p is defined as: 


P(k) 

and 


#{k 




= k] 


D 


, for k = 1,..., K, 


k = arg max fc p{k) . 


( 11 ) 


If d is not reached within the last D epochs, the classification output 
is held back, and the sliding process continues until 0 is reached. In the 
last D epochs, once a class k has been identified, a curve direction criterion 
is introduced to enforce the robustness of the result. For class k to be 
validated, this criterion requires that the direction taken by the displacement 
of covariance matrices be toward the center of class . Hence 5%, 

the sum of gradients {i.e. differentials) of the curve made by distances from 
T<j£j(d) t° should be negative (step 8): 


%= E 


X6~ k (j) 

Xj 


with hti) = 


= 6 k(d) ~ hti -1) < o 

j=d-D-\- 2 

S(£j, Ef) 


( 12 ) 


£f = i<HE,-,sW) 
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The occurrence criterion is inspired by the dynamic stopping of [37]; 
there is no fixed trial length for classification. The occurrence criterion 
ensures that the detected user intention is unaffected by any short time dis¬ 
turbances due to noise or subject’s inattention, as presented in Algorithm 3. 
This approach offers a good compromise to obtain robust results within a 
short and flexible time. 

5 Experimental Validation 

Covariance matrix estimators, Algorithms 2 and 3 are applied to SSVEP 
signals for offline and online analysis. This section presents the analysis and 
results obtained. 

5.1 Data Description 

The signals are recorded from 12 subjects during an SSVEP experiment. 
EEG are measured on C = 8 channels: Oz, O i, O 2 , POz , PO 3 , PO 4 , -PO 7 , 
and PO 3 . The ground and the reference electrodes were placed respectively 
on Fz and the right hear mastoid. The acquisition rate is T s = 256 Hz on 
a gTec MobiLab amp. The subjects are presented with F = 3 visual target 
stimuli blinking respectively at freq = 13Hz, 17Hz and 21Hz. It is a K = 4 
classes BCI setup made of the F = 3 stimulus classes and one resting class 
(no-SSVEP). In a session, which lasts 5 minutes, 32 trials are recorded: 8 
for each visual stimulus and 8 for the resting class. The number of sessions 
recorded per subject varies from 2 to 5. Thus the longest EEG recorded for 
a single subject is 25 minutes or 160 trials. The trial length is 6 seconds, 
that is N = 6 x T s = 1536 samples. Since data are rearranged as detailed in 
(9), trials X G M > FCxN , where FC = 24 corresponding to 8 channels times 3 
stimulus frequencies. For each subject, a test set is made of 32 trials while 
the remaining trials (which might vary from 32 to 128) make up for the 
training set. 

5.2 Covariance Estimators Comparison 

In this section, the effectiveness of covariance matrix estimators is evaluated 
for SSVEP signals. The evaluation is done in terms of classification accuracy, 
information transfer rate (ITR), and integrated discrimination improvement 
(IDI), obtained by each estimator (see Section 3.2) while using the offline 
MDR.M classifier. The different conditioning of covariance matrices are also 
investigated. 

A bootstrapping with 1000 replications is performed to assess the per¬ 
formances of each estimator. Estimators are compared on 10 trial lengths 
t G {0.5,1.0,... 5.0} seconds, as these are known to affect the estimators 
performance. Here N G {128, 256,..., 1280} is computed as N = t x T s . 
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Figure 1: Comparison of covariance estimators in terms of classification accuracy and 
Information Transfer Rate (ITR) obtained with MDRM with increasing EEC trial length. 
For each trial length, the average accuracy and ITR across all subjects and across all 
replications is shown. Bars indicate the standard deviation, (a) Accuracy in percentage, 
(b) ITR in bits/min. 


Figure la shows the classification accuracies of each estimator. The 
increase in the accuracy can be attributed to the fact that the relevant pat¬ 
terns in EEG accumulate with the trial length, producing better estimation 
of the covariance matrices. This is known to be particularly true for the 
SCM estimator and it could be seen in Figure la. It appears that shrinkage 
estimators (especially Ledoit and Schafer) are less affected by the reduction 
of epoch sizes than the other estimators. This is a direct consequence of 
the regularization between the sample covariance matrices and the targeted 
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(expected) covariance matrix of independent variables. The information of 
Figure la is congruent with Figure lb; ITR increases as the trial length 
decreases. The transfer rate is higher for shrinkage estimators with shorter 
epoch sizes, while maintaining correct accuracy. 

For computational purposes, it is important to look at the matrix con¬ 
ditioning. Figure 2a shows the ratio C between the largest and smallest 
eigenvalues: in well-conditioned matrices, C is small. Shrinkage estimators 
offer better conditioned matrices while the SCM, NSCM, and Fixed Point 
matrices are ill-conditioned below 2 seconds of trial length, and may result 
in singular matrices. 

On Figure 2b, the Integrated Discrimination Improvement (IDI), as de¬ 
fined in [64] , is computed for the different estimators and trial lengths. The 
SCM is used as a reference for improvement, as this is the most popular es¬ 
timator in the literature. Negative IDI means a deterioration in the method 
discrimination ability. It is clear that shrinkage estimators increase the dis¬ 
crimination power of the classifier. However, despite being more complex 
than the SCM, the NSCM and the Fixed Point estimators decrease the dis¬ 
crimination ability of classifiers. From Figures la and 2b, it is apparent 
that the difference in performance between the SCM and shrinkage estima¬ 
tors reduces as the trial length increases. The simplicity of the SCM plays a 
favorable role: it is an attractive method for longer trials. However for trial 
lengths compatible with a reasonable ITR for man-machine interaction (< 5 
sec), the shrinkage estimators offer better performances, with the Schafer 
estimator being the best. The p-values under the hypothesis that there is 
no improvement (i.e. IDI = 0) from one estimator to another are all inferior 
to 10” 47 , hence the improvement is significant. It should be noted that the 
estimation of covariance matrices is a trade-off between the quality of the 
estimate and the computation time required; this should be considered for 
real-time processing. 

5.3 Effect of Outliers on Centers Estimation 

(k) 

Outliers can affect the offline training of the K centers of class E^ ' by Algo¬ 
rithm 1. Figure 3 shows representations of training covariance matrices Ej 
in the tangent space (0*), projected at the mean of all training trials (black 
star). To obtain this visualization, the first two principal components of a 
PCA applied on {0j}[ =1 are selected. In Figures 3b and 3d, a Riemannian 
potato [46] is applied with the Riemannian mean of all training matrices 
as a fixed reference. Riemannian potato only removes highly corrupted sig¬ 
nals, allowing to remove outlier trials that negatively impact the estimation 
of class centers. Applying this filter in an offline evaluation improves es¬ 
timation of class centers for the subject with the lowest BCI performance. 
Figure 3b illustrates how center estimates are affected by the filtering. Bad 
estimates of a class center will result in poor clustering. 
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Trial length in seconds (N/Ts) 

(a) 



Trial length in seconds (N/Ts) 

(b) 

Figure 2: (a) Covariance matrices condition expressed as the ratio C between largest and 
smallest eigenvalues for the different covariance estimators. The comparison is done for 
increasing EEC trial length, (b) Integrated discrimination improvement brought to the 
classification task by various estimators along varying trail length. The indicated IDI 
values are multiplied by 10 2 . £ scm is used as baseline. 


It should be noted that the Riemannian potato may have a negative 
effect if the different classes are well separated. The Riemannian potato 
could reject trials that are close to the average of their class but that are 
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far from the mean of all trials. Thus, “legitimate” trials could be discarded, 
even if they have a positive impact on the classification results. Avoiding this 
central bias is a possible improvement of the Riemannian potato algorithm. 




(a) 


(b) 




(c) (d) 

Figure 3: Scatter plot of covariance matrices for all trials mapped on the tangent space. 
The distance between each trial covariance matrix E, and its Riemannian mean class 
E^ fc) is shown as connection line. The black star represents the Riemannian mean of all 
trials. Subject with lowest BCI performance, (a) before and (b) after Riemannian potato 
filtering. Subject with highest BCI performance, (c) before and (d) after Riemannian 
potato filtering. 


5.4 Classification Results 

In this section the classification performances are presented. It is divided 
into two parts: first, the relevance of identifying the latency between cue 
onset and SSVEP response is shown through the evaluation of the impact 
of SSVEP delays considered for offline classification. Then, performances of 
offline and online algorithms are compared and analysed. 
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5.4.1 SSVEP Response Time Delay for Offline 


Figure 4 shows time signals of one trial from each class, measured at Oz, 
for the subject with highest BCI performance and for the subject with low¬ 
est BCI performance. The figure displays the raw signal filtered at the 3 
different stimulus frequency, each with a different color: 13Hz in blue, 17Hz 
in red, 21Hz in green. It is visible that synchronization for current trial 
frequency appears only 2 seconds after cue onset tq = Os on the x-axis 
of the plots. In fact, before to + 2 seconds the signal is still synchronized 
with the previous trial frequencies. In the signals from the subject with the 
highest BCI performance, synchronization at the trial stimulus frequency is 
predominant and permanent after To+ 2, which eases the classification task. 
Moreover, for the resting trial (no SSVEP), the power in all stimuli frequen¬ 
cies is sensibly decreased. However, the poor predictive capability for the 
subject with lowest BCI performance could be explained by the fact that the 
predominance of the trial stimulus frequency is not always evident and not 
permanent. Also, during the resting trial, the power in stimuli frequencies 
is not significantly decreased. 

Similar observation is made from Figure 5 where the time-frequency 
representations of EEG trials per class are plotted for the subjects with the 
highest and lowest BCI performances. The time-frequency representation 
is obtained with a STFT on rectangular windowing. Each plot has 4 rows, 
one for each class, displaying the power spectral density (averaged over all 
trials) in the 13Hz, 17Hz, and 21Hz frequency bands as a function of the 
trial time. An important increase in average classification accuracy (almost 
10 %) could be obtained by taking the trial from 2 seconds after cue onset. 
It is therefore crucial to consider the latency between the trial’s cue onset 
and the actual synchronization of SSVEP at stimulus frequency. 

5.4.2 Online Classification Results 

In the offline synchronous analysis, the confident window for classification 
was found to be at least 2 seconds after the cue onset (tq + 2). In an online 
asynchronous experiment, there is no cue onset, and the delay before SSVEP 
synchronization might defer from a trial to another and from a subject to 
another. To locate the trust EEG region for the classification, D and i? are 
set respectively to 5 and 0.7 through cross validation. The epoch size is 
w = 3.6 seconds, and the overlap is An = 0.2 second. 

The observation of Figure 6 provides a visualization of the principle 
guiding the online implementation of Equation (12). This figure shows the 
trajectory on the tangent space taken by covariance matrices during a 4- 
class SSVEP experiment, and how they are classified epoch by epoch. It 
can be seen (encircled in Figure 6a) that a change in the SSVEP stimulus 
might not be detected instantaneously by the classifier. The trials are er- 
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(a) Example of trials for the subject with the highest BCI performance 



(b) Example of trials for the subject with the lowest BCI performance 


Figure 4: Signal amplitude at each stimulus frequency, showing synchronization of EEG 
with respect to time (seconds). The raw signal of the trial measured on Oz is band filtered 
using a Butterworth of order 8 at each stimulus frequency and the resulting signals are 
shown in blue (dark grey) , green (grey), and red (light grey) for the same signal filtered 
respectively at 13, 17, and 21 Hz. The cue onset to at time 0 on the x-axis is shown with a 
vertical discontinued line. 4 trials are shown, one for each class. Signals from the subjects, 
(a) with highest BCI performance and (b) with lowest BCI performance. 


roneously attributed with confidence to the previous class. The proposed 
online algorithm, described in Algorithm 3, mitigates this issue and increases 
the classification accuracy as shown in Table 1. In the ‘Offline’ column are 
the classification accuracies for each subject obtained by the application of 
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(a) Time-frequency diagrams for the subject with the highest BCI per¬ 
formance 



(b) Time-frequency diagrams for the subject with the lowest BCI per¬ 
formance 

Figure 5: Time-frequency representation of EEC from various classes showing synchro¬ 
nization at in frequency bands (indicated on y axis) on both side of the cue onset. The 
power of synchronization is shown in the color bar. Dark colors show strong synchro¬ 
nization. 4 trials (top to bottom) are represented, one for each class. Signals from the 
subjects, (a) with highest BCI performance and (b) with lowest BCI performance. 


the state-the-art MDRM as described in Algorithm 2, which classifies full 
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(a) 



(b) 

Figure 6: Covariance matrices trajectory during a 4-class SSVEP online recording. The 
circles represent class centers. The triangles mark the beginning in the experiment of a 
new trial whose class is indicated by the triangle’s color. 6a shows the first 7 trials. The 
first 3 trials are from the resting class, the remaining are respectively class 13Hz, 17Hz, 
and 21Hz. 6b shows the entire recording. Data are from the subject with the highest BCI 
performance. 


trials. Column ‘Offline opt.’ shows the results of this approach improved 
by taking into consideration the SSVEP response time delay as found in 
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section 5.4.1: trials are taken 2 second after cue onset. Columns ‘Online’ 
and ‘Online + contain the performance of the online algorithm as de¬ 
scribed in Algorithm 3. The ‘Online’ column shows the results of the online 
algorithm without the curve direction criterion (i.e., without steps 6 to 11), 
and ‘Online + 5^ shows the improvement brought by this criterion. The 
performances are in terms of average classification accuracy (acc(%)) and 
average time taken into the trial before classification (delay(s)). For the 
offline approaches, classification is only done at the end of the trial i.e. 6 
seconds after cue onset. 

The curve direction criterion increases the rejection of epochs that could be 
wrongly classified, it thus significantly increases the classification accuracy 
of the online algorithm (74.73% to 79.13%), while increasing the delay (1.006 
s to 1.239 s) before classification. When compared to the state-of-the-art 
MDRM, the curve-based classification yields better results both in terms 
of classification accuracy and delay before classification, i.e. from 71.51% 
to 79.13% and from 6 s (or longer, depending on trial length N) to 1.239 
s. There is a small drop in classification accuracy from optimized offline 
(81.27%) to online (79.13%) algorithms. However, classification outputs are 
reached faster with the online algorithm. Moreover, the online algorithm 
can be applied in both synchronous and asynchronous paradigms, while 
the offline algorithms are limited to synchronous paradigms which provide 
strongly limited user interaction. 



Offline 

Offline opt. 

Online 

Online + 5j. 

Sub. 

acc(%) 

acc(%) 

acc(%) 

delay(s) 

acc(%) 

delay(s) 

1 

62.50 

73.44 

70.31 

1.081 

71.37 

1.338 

2 

67.19 

79.69 

78.12 

0.950 

75.00 

1.303 

3 

85.94 

85.94 

87.50 

0.950 

89.06 

0.972 

4 

78.12 

87.50 

73.44 

1.016 

75.00 

1.056 

5 

70.31 

68.75 

70.31 

1.013 

70.31 

1.319 

6 

79.69 

85.94 

87.50 

0.997 

87.35 

1.338 

7 

78.12 

88.54 

84.38 

1.004 

85.42 

1.144 

8 

89.06 

92.19 

85.94 

0.947 

89.06 

1.072 

9 

62.50 

70.31 

67.19 

0.922 

75.00 

1.244 

10 

60.94 

80.47 

63.28 

1.044 

69.53 

1.264 

11 

43.75 

65.62 

59.38 

1.109 

68.75 

1.375 

12 

80.00 

96.88 

69.37 

1.036 

93.75 

1.443 

mean 

71.51 

81.27 

74.73 

1.006 

79.13 

1.239 


Table 1: Average classification accuracies (acc(%)) achieved using the state-of-the-art 
MDRM (Offline), the optimized version of the MDRM (Offline opt.), the online algorithm 
without the curve direction criterion (Online), and the complete online algorithm (Online 
+ Sj,). For the online algorithms, the delay between trial start and classification is also 
provided (delay(s)). 
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6 Conclusion 


This work investigated the efficiency of Riemannian geometry when deal¬ 
ing with covariance matrices as classification features. A novel algorithm, 
relying on the curve direction in the space of covariance EEG signals, was 
introduced and applied on a SSVEP classification task for a 4-class brain 
computer interface. Different covariance matrix estimators were investigated 
and their robustness was assessed on multichannel SSVEP signals to ensure 
that the obtained matrices are accurate estimates of data covariance, are 
well conditioned, and verify the positive-definiteness property. The Schafer 
shrinkage estimator was found to be the best as it yielded the highest clas¬ 
sification accuracy with the MDRM algorithm. 

The introduced online algorithm enhances the stability of the BCI sys¬ 
tem, balancing between the classification speed and the prediction accuracy. 
Classification confidence over a number of epochs mitigates the short term 
perturbations in the experimental conditions and the attentional variations 
of the subject, wile considering the curve direction overcomes the misclassifi- 
cation of EEG trials that are still synchronized with past stimuli frequencies 
at classification time. 

The Riemannian geometric approach and the determination of the Rie¬ 
mannian mean of classes, offers a powerful tool to develop robust algorithms. 
A possible extension to the proposed algorithm is to include a dynamic adap¬ 
tation of the Riemannian mean for each class. This dynamic adaptation was 
implemented, using the latest successfully classified covariance matrices to 
replace to oldest ones in the computation of the mean. It is not reported 
in this paper as there was no significant improvement in the results, which 
could be attributed to the relatively short duration of the EEG recording 
sessions, which were no longer than 25 min. The Riemannian means of 
classes do not change much over the sessions, but for longer recording time 
this adaptation could bring significant improvements. 
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